-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: store event bloom filter in DB #473
base: main
Are you sure you want to change the base?
Conversation
ca2b119
to
42e4668
Compare
42e4668
to
f5ef319
Compare
de25cf7
to
cff833c
Compare
#[cfg(test)] | ||
#[test] | ||
fn test_column_all() { | ||
assert_eq!(Column::ALL.len(), Column::NUM_COLUMNS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really useful since NUM_COLUMNS
is already defined as Self::ALL.len();
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just in case of update
crates/madara/client/rpc/src/versions/user/v0_7_1/methods/read/get_events.rs
Outdated
Show resolved
Hide resolved
crates/madara/primitives/bloom_filter/benches/bloom_benchmark.rs
Outdated
Show resolved
Hide resolved
fn bench_bloom_filters(c: &mut Criterion) { | ||
let test_data = generate_test_data(TEST_DATA_SIZE); | ||
|
||
// Test different hash functions | ||
let hashers = [ | ||
get_hasher_benchmarks::<DefaultHasher>("DefaultHasher"), | ||
get_hasher_benchmarks::<AHasher>("AHasher"), | ||
get_hasher_benchmarks::<XxHash64>("XxHash64"), | ||
]; | ||
|
||
for hasher in &hashers { | ||
(hasher.sequential_insertion)(c, &test_data, hasher.name); | ||
(hasher.parallel_insertion)(c, &test_data, hasher.name); | ||
(hasher.lookup)(c, &test_data, hasher.name); | ||
(hasher.parallel_lookup)(c, &test_data, hasher.name); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really cool testing structure!
/// where: | ||
/// - k is the number of hash functions | ||
/// - p is the desired false positive rate (0.01) | ||
/// Reference: [Bloom, B. H. (1970). "Space/Time Trade-offs in Hash Coding with Allowable Errors"](https://dl.acm.org/doi/10.1145/362686.362692) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
pub const HASH_COUNT: u8 = 7; | ||
|
||
// Number of bits per element in the Bloom filter. | ||
// The value 9.6 is optimal for a false positive rate of 1% (0.01). | ||
const BITS_PER_ELEMENT: f64 = 9.6; | ||
|
||
pub const FALSE_POSITIF_RATE: f64 = 0.01; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These constants appear several times in the codebase, maybe we want to define them only once and re-use them?
7b985e6
to
56a51cc
Compare
Pull Request type
Please add the labels corresponding to the type of changes your PR introduces:
What is the current behavior?
Currently, the event pagination mechanism does not correctly handle the continuation token
Additionally, event filtering does not leverage a bloom filter, which could improve performance by reducing unnecessary block lookups.
Resolves: #NA
What is the new behavior?
Integration of
mp-bloom-filter
for event filtering:mp-bloom-filter
crate to improve filtering efficiency.Optimized event retrieval:
take(chunk_size + 1)
in the filtering process to determine whether the current block has more events.Does this introduce a breaking change?
Yes
Other information